---
title: Glossary home
description: The Glossary provides brief definitions of terms relevant to the DataRobot platform.

---

# Glossary

The DataRobot glossary provides brief definitions of terms relevant to the DataRobot platform. These terms span all phases of machine learning, from data to deployment.

[All](#){ .md-button .selected data-type=all }
[Data](#){ .md-button data-type=data-prep }
[Modeling](#){ .md-button data-type=modeling }
[Time-aware](#){ .md-button data-type=time-aware }
[Predictions](#){ .md-button data-type=predictions }
[MLOps](#){ .md-button data-type=mlops }
[Generative AI](#){ .md-button data-type=gen-ai }


## A
-----------

#### Accuracy Over Space {: #accuracy-over-space data-category=modeling}
A model Leaderboard tab ([Evaluate > Accuracy Over Space](lai-insights)) and Location AI insight that provides a spatial residual mapping within an individual model.

#### Accuracy Over Time {: #accuracy-over-time data-category=time-aware;modeling }
A model Leaderboard tab ([Evaluate > Accuracy Over Time](aot)) that visualizes how predictions change over time.

#### ACE scores {: #ace-scores data-category=modeling}
Also known as Alternating Conditional Expectations. A univariate measure of correlation between the feature and the target. ACE scores detect non-linear relationships, but as they are univariate, they do not detect interaction effects.

#### Actuals {: #actuals data-category=predictions }
Actual values for an ML model that let you track its prediction outcomes. To generate accuracy statistics for a deployed model, you compare the model's predictions to real-world actual values for the problem. Both the prediction dataset and the actuals dataset must contain association IDs, which let you match up corresponding rows in the datasets to gauge the model's accuracy.

#### Advanced Tuning {: #advanced-tuning data-category=modeling }
The ability to manually set model parameters after the model build, supporting experimentation with parameter settings to improve model performance.

#### Aggregate image feature {: #aggregate-image-feature data-category=modeling }
Used with Visual AI. A set of image features where each individual element of that set is a constituent image feature. For example, the set of image features extracted from an image might include a set of features indicating:

1. The colors of the individual pixels in the image.
2. Where edges are present in the image.
3. Where faces are present in the image.

From the aggregate it may be possible to determine the impact of that feature on the output of a data analytics model and compare that impact to the impacts of the model's other features.

#### AI Catalog {: #ai-catalog data-category=data-prep }
A browsable and searchable collection of registered objects that contains definitions and relationships between various object types. Items stored in the catalog include data connections, data sources, and data metadata.

#### AIM {: #aim data-category=modeling }
The second phase of [Exploratory Data Analysis](#eda-exploratory-data-analysis) (i.e., EDA2), that determines feature importances based on cross-correlation with the target feature. That data determines the “informative features” used for modeling during Autopilot.

#### Alternating Conditional Expectations {: #alternating-conditional-expectations data-category=modeling }
See [ACE scores](#ace-scores).

#### Anomaly detection {: #anomaly-detection data-category=modeling }
A form of [unsupervised learning](unsupervised/index) used to detect anomalies in data. Anomaly detection, also referred to as outlier or novelty detection, can be useful with data having a low percentage of irregularities or large amounts of unlabeled data. See also [unsupervised learning](#unsupervised-learning).

#### Apps {: #apps }
See [No-Code AI Apps](#no-code-ai-apps).

#### ARIMA (AutoRegressive Integrated Moving Average) {: #arima-autoregressive-integrated-moving-average data-category=time-aware;modeling }
A class of time series model that projects the future values of a series based entirely on the patterns of that series.

#### Association ID {: #association-id data-category=mlops }
An identifier that functions as a foreign key for your prediction dataset so you can later match up actual values (or "actuals") with the predicted values from the deployed model. An association ID is required for monitoring the accuracy of a deployed model.
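
As a minimal sketch of how an association ID lets predictions and actuals be matched, assume each row is a plain dictionary. The field names and the `match_actuals` helper are illustrative only and are not part of any DataRobot API.

```python
def match_actuals(predictions, actuals):
    """Pair each prediction with its actual value using the association ID.

    Rows are assumed to be dicts; predictions with no reported actual are skipped.
    """
    actual_by_id = {row["association_id"]: row["actual"] for row in actuals}
    return [
        (row["association_id"], row["prediction"], actual_by_id[row["association_id"]])
        for row in predictions
        if row["association_id"] in actual_by_id
    ]


predictions = [
    {"association_id": "a1", "prediction": 1},
    {"association_id": "a2", "prediction": 0},
    {"association_id": "a3", "prediction": 1},  # no actual reported yet
]
actuals = [
    {"association_id": "a1", "actual": 1},
    {"association_id": "a2", "actual": 1},
]

matched = match_actuals(predictions, actuals)
# Share of matched rows where the prediction equals the actual.
accuracy = sum(pred == act for _, pred, act in matched) / len(matched)
```

Only rows that appear in both datasets contribute to the accuracy figure, which is why the identifier must be present on both sides.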

#### AUC (Area Under the Curve) {: #auc-area-under-the-curve data-category=modeling }
A common error metric for binary classification that considers all possible thresholds and summarizes performance in a single value on the ROC Curve. It measures the ability of a model to separate the 1s from the 0s. The larger the area under the curve, the more accurate the model.
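
One way to see what AUC summarizes is its pairwise interpretation: the fraction of (positive, negative) pairs in which the positive row receives the higher score, with ties counted as half. The sketch below uses that interpretation; the `auc` function is an illustration written for this glossary, not DataRobot's implementation.

```python
def auc(y_true, y_score):
    """Area under the ROC curve via pairwise comparison of scores.

    Assumes y_true contains only 0s and 1s, with at least one of each.
    """
    positives = [s for y, s in zip(y_true, y_score) if y == 1]
    negatives = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

A value of 1.0 means every positive is ranked above every negative; 0.5 corresponds to random ordering.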

#### Augmented Intelligence {: #augmented-intelligence data-category=modeling }
DataRobot's enhanced approach to artificial intelligence, which expands current model building and deployment assistance practices. The DataRobot platform fully automates and governs the AI lifecycle from data ingest to model training and predictions to model-agnostic monitoring and governance. Guardrails ensure adherence to data science best practices when creating machine learning models and AI applications. Transparency across user personas and access to data wherever it resides avoids lock-in practices.

#### Automated Retraining {: #automated-retraining data-category=mlops }
Retraining strategies for MLOps that refresh production models based on a schedule or in response to an event (for example, a drop in accuracy or data drift). Automated Retraining also uses DataRobot’s AutoML to create and recommend new challenger models. When combined, these strategies maximize accuracy and enable timely predictions.

#### AutoML (Automated Machine Learning) {: #automl-automated-machine-learning data-category=modeling }
A software system that automates many of the tasks involved in preparing a dataset for modeling and running a model selection process that evaluates each candidate model, with the goal of identifying the best-performing model for a specific use case. Used for predictive modeling; see also [time series](#time-series) for forecasting.

#### Autopilot (full Autopilot) {: #autopilot-full-autopilot data-category=modeling }
The DataRobot "survival of the fittest" modeling mode that automatically selects the best predictive models for the specified target feature and runs them at ever-increasing sample sizes. In other words, it runs more models in the early stages on a small sample size and advances only the top models to the next stage. In full Autopilot, DataRobot runs models at 16% (by default) of total data and advances the top 16 models, then runs those at 32%. Taking the top 8 models from that run, DataRobot runs on 64% of the data (or 500MB of data, whichever is smaller). See also [Quick (Autopilot)](#quick-autopilot), [Comprehensive](#comprehensive), and [Manual](#manual).
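
The staged elimination described above can be sketched roughly as follows. The `run_stages` function and the `evaluate` callback are hypothetical names, and the sketch assumes a higher score is better; it illustrates only the 16% / 32% / 64% progression and is not DataRobot's implementation.

```python
def run_stages(candidates, evaluate, stages=((16, 16), (32, 8), (64, None))):
    """Run candidates through stages of (sample_percent, models_to_advance).

    `evaluate(model, sample_percent)` is an assumed scoring callback where a
    higher score is better. A stage with `None` keeps every remaining model.
    """
    survivors = list(candidates)
    for sample_pct, keep in stages:
        ranked = sorted(survivors, key=lambda m: evaluate(m, sample_pct), reverse=True)
        survivors = ranked if keep is None else ranked[:keep]
    return survivors


# Toy run: 40 candidate "models" whose score is simply their own number.
finalists = run_stages(range(40), evaluate=lambda model, pct: model)
```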

#### AutoTS (Automated time series) {: #autots-automated-time-series data-category=time-aware;modeling}
A software system that automates all or most of the steps needed to build forecasting models, including featurization, model specification, model training, model selection, validation, and forecast generation. See also [time series](#time-series).

#### Average baseline {: #average-baseline data-category=time-aware;modeling }
The average of the target in the [Feature Derivation Window](#feature-derivation-window); used in time series modeling.
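
As a small illustration, assume the series is a list ordered oldest to newest, the forecast point is a list index, and the window covers the `fdw_size` values immediately before it. Whether the value at the forecast point itself is included depends on project settings, so the boundary used here is an assumption, and `average_baseline` is an illustrative helper name.

```python
def average_baseline(series, forecast_point, fdw_size):
    """Mean of the target over the window ending just before the forecast point."""
    window = series[forecast_point - fdw_size:forecast_point]
    return sum(window) / len(window)


target = [1, 2, 3, 4, 5, 6]
baseline = average_baseline(target, forecast_point=5, fdw_size=3)  # uses [3, 4, 5]
```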

## B
-----------

#### Backtesting {: #backtesting data-category=time-aware;modeling }
The time-aware equivalent of cross-validation. Unlike cross-validation, however, backtests allow you to select specific time periods or durations for your testing instead of random rows, creating “trials” for your data.
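
The difference from random row assignment can be sketched with a simple chronological splitter. This is a simplified illustration: `backtest_splits` is a hypothetical helper, rows are assumed to be sorted oldest to newest, and real backtest configuration (gaps, durations, sampling) is omitted.

```python
def backtest_splits(rows, n_backtests, validation_size):
    """Return (training, validation) pairs, most recent backtest first.

    Each validation block is a contiguous period; training uses only rows
    that come before it, so no future data leaks into training.
    """
    splits = []
    for i in range(n_backtests):
        end = len(rows) - i * validation_size
        start = end - validation_size
        if start <= 0:
            break  # not enough history left for another backtest
        splits.append((rows[:start], rows[start:end]))
    return splits


splits = backtest_splits(list(range(10)), n_backtests=2, validation_size=3)
```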

#### Baseline model {: #baseline-model data-category=time-aware;modeling }
Also known as a naive model. A simple model used as a comparison point to confirm that a generated ML or time series model is learning with more accuracy than a basic non-ML model.

For example, generated ML models for a regression project should perform better than a baseline model that predicts the mean or median of the target. Generated ML models for a time series project should perform better than a baseline model that predicts the future using the most recent actuals (i.e., using today's actual value as tomorrow's prediction).

For time series projects, baseline models are used to calculate the [MASE metric](opt-metric#mase) (the ratio of the [MAE metric](opt-metric#maeweighted-mae) over the baseline model).
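
A minimal sketch of that ratio, assuming the baseline is the naive "previous actual" forecast described above and that both models are scored on the same rows. The helper names are illustrative, and the exact rows DataRobot uses when computing MASE may differ.

```python
def mae(actuals, predictions):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actuals, predictions)) / len(actuals)


def naive_forecast(series):
    """Baseline: predict each value using the previous actual."""
    return series[:-1]


def mase(series, model_predictions):
    """MAE of the model divided by MAE of the naive baseline.

    `model_predictions` is assumed to align with series[1:].
    """
    targets = series[1:]
    return mae(targets, model_predictions) / mae(targets, naive_forecast(series))


score = mase([10, 12, 11, 13], model_predictions=[11, 11, 12])
```

A result below 1 indicates the model's error is smaller than the baseline's; a result above 1 indicates the baseline is doing better.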

#### Batch predictions {: #batch-predictions data-category=predictions;mlops }
A method of making predictions with large datasets, in which you pass input data and get predictions for each row; predictions are written to output files. Users can make batch predictions with MLOps via the **Predictions** interface or can use the Batch Prediction API for automating predictions. Schedule batch prediction jobs by specifying the prediction data source and destination and determining when the predictions will be run.

#### Bias Mitigation {: #bias-mitigation data-category=modeling }
Augments blueprints with a pre- or post-processing task intended to reduce bias across classes in a protected feature. Bias Mitigation is also a model Leaderboard tab ([Bias and Fairness > Bias Mitigation](fairness-metrics#retrain-with-mitigation)) where you can apply mitigation techniques after Autopilot has finished.

#### Bias vs Accuracy {: #bias-vs-accuracy data-category=modeling }
A Leaderboard tab that generates a chart to show the tradeoff between predictive accuracy and fairness, removing the need to manually note each model's accuracy score and fairness score for the protected features.

#### "Blind History" {: #blind-history data-category=time-aware;modeling }
"Blind history," used in time-aware modeling, captures the gap created by the delay of access to recent data (e.g., “most recent” may always be one week old). It is defined as the period of time between the smaller of the values supplied in the Feature Derivation Window and the forecast point. A gap of zero means "use data up to, and including, today;" a gap of one means "use data starting from yesterday" and so on.

#### Blender {: #blender data-category=modeling }
A model that potentially increases accuracy by combining the predictions of between two and eight models. DataRobot can be configured to automatically create blender models as part of Autopilot, based on the top three regular Leaderboard models (for AVG, GLM, and ENET blenders). You can also create blenders manually (aka ensemble models).
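
The simplest case, an average (AVG) blend, can be sketched as the row-wise mean of the component models' predictions. `average_blend` is an illustrative helper, not a DataRobot function, and GLM or ENET blenders would instead fit a model on top of these predictions.

```python
def average_blend(model_predictions):
    """Row-wise mean of predictions from several models.

    `model_predictions` is a list with one prediction list per model,
    all of the same length and in the same row order.
    """
    return [sum(row) / len(row) for row in zip(*model_predictions)]


blended = average_blend([
    [0.25, 0.5],   # model A
    [0.5, 0.5],    # model B
    [0.75, 0.5],   # model C
])
```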

#### Blueprint {: #blueprint data-category=modeling }
A graphical representation of the many steps involved in transforming input predictors and targets into a model. A blueprint represents the high-level end-to-end procedure for fitting the model, including any preprocessing steps, algorithms, and post-processing. Each box in a blueprint may represent multiple steps. You can view a graphical representation of a blueprint by clicking on a model on the Leaderboard. See also [user blueprints](#user-blueprints).

## C
-----------

#### "Can't operationalize" period {: #cant-operationalize-period data-category=time-aware;modeling }
The "can't operationalize" period, used in time series modeling, defines the gap of time immediately after the Forecast Point and extending to the beginning of the Forecast Window. It represents the time required for a model to be trained, deployed to production, and to start making predictions&mdash;the period of time that is too near-term to be useful. For example, predicting staffing needs for tomorrow may be too late to allow for taking action on that prediction.

#### Catalog {: #catalog data-category=data-prep}
See [AI Catalog](#ai-catalog).

#### Centroid {: #centroid data-category=modeling }
The center of a cluster generated using [unsupervised learning](#unsupervised-learning). A centroid is the multi-dimensional average of a cluster, where the dimensions are observations (data points).
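
For intuition, the sketch below computes a centroid as the per-coordinate mean of the points assigned to one cluster. The `centroid` helper is illustrative and assumes every point has the same number of coordinates.

```python
def centroid(points):
    """Per-coordinate mean of the points in a single cluster."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


center = centroid([(1, 2), (3, 4), (5, 6)])
```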

#### CFDS (Customer Facing Data Scientist) {: #cfds-customer-facing-data-scientist }
A DataRobot employee responsible for the technical success of users and potential users. They assist with tasks like structuring data science problems to complete integration of DataRobot. CFDS are passionate about ensuring user success.

#### Challenger models {: #challenger-models data-category=mlops }
Models that you can compare to a currently deployed model (the "champion" model) to continue model comparison post-deployment. Submit a challenger model to shadow a deployed model and replay predictions made against the champion to determine if there is a superior DataRobot model that would be a better fit.

#### Champion model {: #champion-model data-category=mlops;time-aware;modeling }
A model recommended by DataRobot&mdash;for a deployment (predictions) or for time series segmented modeling.

In MLOps, you can replace the champion selected for a deployment yourself or you can set up [Automated Retraining](set-up-auto-retraining), where DataRobot compares challenger models with the champion model and replaces the champion model if a challenger outperforms the champion.

In the segmented modeling workflow, DataRobot builds a model for each segment. DataRobot recommends the best model for each segment&mdash;the segment champion. The segment champions roll up into a Combined Model. For each segment, you can select a different model as champion, which is then used in the Combined Model.

#### Channel {: #channel data-category=modeling }
The connection between an output port of one module and an input port of another module. Data flows from one module's output port to another module's input port via a channel, represented visually by a line connecting the two.

#### Classification {: #classification data-category=modeling }
A type of prediction problem that classifies values into discrete, final outcomes or classes. _Binary classification_ problems are those datasets in which what you are trying to predict can be one of two classes (for example, "yes" or "no"). _Multiclass classification_ is a classification problem that results in more than two outcomes (for example, "buy", "sell", or "hold"). _Unlimited multiclass_ is the ability to handle projects with a target feature containing an unlimited number of classes, with support for both a high threshold of individual classes and multiclass aggregation to support an unlimited number of classes above the threshold. See also [regression](#regression).

#### Clustering {: #clustering data-category=modeling }
A form of [unsupervised learning](#unsupervised-learning) used to group similar data and identify natural segments.

#### Coefficients {: #coefficients data-category=modeling }
A model Leaderboard tab ([Describe > Coefficients](coefficients)) that provides a visual indicator of information that can help you refine and optimize your models.

#### Combined Model {: #combined-model data-category=time-aware;modeling }
The final model generated in a time series segmented modeling workflow. With segmented modeling, DataRobot builds a model for each segment and combines the segment champions into a single Combined Model that you can deploy.

#### Common event {: #common-event data-category=time-aware;modeling }
A data point is a common event if it occurs in a majority of weeks in the data (for example, regular business days and hours would be common, but an occasional weekend data point would be uncommon).

#### Compliance documentation {: #compliance-documentation data-category=modeling }
Automated model development documentation that can be used for regulatory validation. The documentation provides comprehensive guidance on what constitutes effective model risk management.

#### Composable ML {: #composable-ml data-category=modeling }
A code-centric feature, designed for data scientists, that allows applying custom preprocessing and modeling methods to create a blueprint for model training. Using built-in and [custom tasks](#custom-task), you can compose and then integrate the new blueprint with other DataRobot features to augment and improve machine learning pipelines.

#### Comprehensive {: #comprehensive data-category=modeling }
A modeling mode that runs all Repository blueprints on the maximum Autopilot sample size to ensure more accuracy for models.

#### Computer vision {: #computer-vision data-category=modeling }
Use of computer systems to analyze and interpret image data, used with Visual AI. Computer vision tools generally use models that incorporate principles of geometry to solve specific problems within the computer vision domain. For example, computer vision models may be trained to perform object recognition (recognizing instances of objects or object classes in images), identification (identifying an individual instance of an object in an image), detection (detecting specific types of objects or events in images), etc.

#### Computer vision tools/techniques {: #computer-vision-toolstechniques data-category=modeling }
Tools&mdash;for example, models, systems&mdash;that perform image preprocessing, feature extraction, and detection/segmentation functions.

#### Confusion matrix {: #confusion-matrix data-category=modeling }
A table that reports true versus predicted values. The name “confusion matrix” refers to the fact that the matrix makes it easy to see if the model is confusing two classes (consistently mislabeling one class as another class). The confusion matrix is available as part of the ROC Curve, Eureqa, and Confusion Matrix for multiclass model visualizations in DataRobot.
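
As a minimal sketch, a confusion matrix can be built by counting (actual, predicted) pairs. The `confusion_matrix` helper and the sample labels are illustrative only.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count how often each actual class was predicted as each class.

    Keys are (actual_class, predicted_class) tuples; missing pairs count as 0.
    """
    return Counter(zip(actual, predicted))


actual = ["cat", "cat", "dog", "dog", "dog"]
predicted = ["cat", "dog", "dog", "dog", "cat"]
matrix = confusion_matrix(actual, predicted)
# Off-diagonal keys such as ("cat", "dog") show where classes are confused.
```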

#### Constraints {: #constraints data-category=modeling }
A model Leaderboard tab ([Describe > Constraints](monotonic)) that allows you to review monotonically constrained features if feature constraints were configured in Advanced Options prior to modeling.

#### Credentials {: #credentials data-category=data-prep }
Information used to authenticate and authorize actions against data connections. The most common connection is through username and password, but alternate authentication methods include LDAP, Active Directory, and Kerberos.

#### Cross-Class Accuracy {: #cross-class-accuracy data-category=modeling }
A model Leaderboard tab ([Bias and Fairness > Cross-Class Accuracy](cross-acc)) that helps show why the model is biased, and where in the training data it learned the bias from. [Bias and Fairness settings](fairness-metrics) must be configured.

#### Cross-Class Data Disparity {: #cross-class-data-disparity data-category=modeling }
A model Leaderboard tab ([Bias and Fairness > Cross-Class Data Disparity](cross-data)) that calculates, for each protected feature, evaluation metrics and ROC curve-related scores segmented by class. [Bias and Fairness settings](fairness-metrics) must be configured.

#### Cross-Validation {: #cross-validation data-category=modeling }
Also known as CV. A type of validation partition that is run to test (validate) model performance. Using subsets ("folds") of the validation data, DataRobot creates one model per fold, with the data assigned to that fold used for validation and the rest of the data used for training. By default, DataRobot uses five-fold cross-validation and presents the mean of those five scores on the Leaderboard. See also [validation](#validation).
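
The fold mechanics can be sketched as follows. The fold assignment here is simple interleaving by row index, which is an assumption made to keep the example deterministic, and `fit_and_score` is a hypothetical callback that trains on one set of row indices and returns a score for the other.

```python
def k_fold_indices(n_rows, k=5):
    """Yield (training_indices, validation_indices) once per fold."""
    folds = [list(range(i, n_rows, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validation_score(n_rows, fit_and_score, k=5):
    """Mean of the per-fold scores, one model per fold."""
    scores = [fit_and_score(train, valid) for train, valid in k_fold_indices(n_rows, k)]
    return sum(scores) / len(scores)


# Toy scorer: reports the validation fold size instead of a real metric.
mean_score = cross_validation_score(10, fit_and_score=lambda train, valid: len(valid))
```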

#### Custom inference models {: #custom-inference-models data-category=mlops }
User-created, pre-trained models uploaded as a collection of files via the Custom Model Workshop. Upload a model artifact to create, test, and deploy custom inference models to the centralized deployment hub in DataRobot. An inference model can have a predefined input/output schema or it can be unstructured. To customize prior to model training, use [custom tasks](#custom-task).

#### Custom model workshop {: #custom-model-workshop data-category=mlops }
In the [Model Registry](#model-registry), a location where you can upload user-created, pre-trained models as a collection of files. You can use these model artifacts to create, test, and deploy custom inference models to the centralized deployment hub in DataRobot.

#### Custom task {: #custom-task data-category=mlops }
A data transformation or ML algorithm, for example, XGBoost or One-hot encoding, that can be used as a step in an ML blueprint inside DataRobot and used for model training. Tasks are written in Python or R and are added via the Custom Model Workshop. Once saved, the task can be used when modifying a blueprint with [Composable ML](#composable-ml). To deploy a pre-trained model where re-training is not required, use [custom inference models](#custom-inference-models).

#### CV {: #cv data-category=modeling }
See [Cross-Validation](#cross-validation).

## D
-----------

#### Data drift {: #data-drift data-category=mlops }
The difference between values in new inference data used to generate predictions for models in production and the training data initially used to train the deployed model. Predictive models learn patterns in training data and use that information to predict target values for new data. When the training data and the production data change over time, causing the model to lose predictive power, the data surrounding the model is said to be drifting. Data drift can happen for a variety of reasons, including data quality issues, changes in feature composition, and even changes in the context of the target variable.

#### Data management {: #data-management data-category=data-prep}
The umbrella term related to loading, cleaning, transforming, and storing data within DataRobot. It also refers to the practices that companies follow when collecting, storing, using, and deleting data.

#### Data preparation {: #data-preparation data-category=data-prep}
The process of transforming raw data to the point where it can be run through machine learning algorithms to uncover insights or make predictions. Also called “data preprocessing.”

#### Data Quality Handling Report {: #data-quality-handling-report data-category=data-prep}
A model Leaderboard tab ([Describe > Data Quality Handling Report](dq-report)) that analyzes the training data and provides the following information for each feature: feature name, variable type, row count, percentage, and data transformation information.

#### DataRobot User Model (DRUM) {: #datarobot-user-model-drum data-category=mlops }
A tool that allows you to test Python, R, and Java custom models and tasks locally. The test allows you to verify that a custom model can successfully run and make predictions in DataRobot before uploading it.

#### DataRobot University (DRU) {: #datarobot-university-dru }
Provides practical data science education to solve business problems. <a target="_blank" href="https://university.datarobot.com/">DRU</a> offers guided learning, self-paced and instructor-led courses, and labs, as well as certification programs, across many topics and skill levels.

#### Dataset {: #dataset data-category=data-prep;modeling}
Data, a file or the content of a data source, at a particular point in time. A data source can produce multiple datasets; an AI Catalog dataset has exactly one data source. In [AI Catalog](#ai-catalog), a dataset is materialized data that is stored with a catalog version record. There may be multiple catalog version records associated with an entity, indicating that DataRobot has reloaded or refreshed the data. The older versions are stored to support existing projects; new projects use the most recent version. A dataset can be in one of two states:

* A "snapshotted" (or materialized) dataset is an immutable snapshot of data that has previously been retrieved and saved.
* A “remote” (or unmaterialized) dataset has been configured with a location from which data is retrieved on-demand (AI Catalog).

#### Data connection {: #data-connection data-category=data-prep}
A configured connection to a database&mdash;it has a name, a specified driver, and a JDBC URL. You can register data connections with DataRobot for ease of re-use. A data connection has one connector but can have many data sources.

#### Data source {: #data-source data-category=data-prep}
A configured connection to the backing data (the location of data within a given endpoint). A data source specifies, via SQL query or selected table and schema data, which data to extract from the data connection to use for modeling or predictions. Examples include the path to a file on HDFS, an object stored in S3, and the table and schema within a database. A data source has one data connection and one connector but can have many datasets. It is likely that the features and columns in a data source do not change over time, but that the rows within change as data is added or deleted.

#### Data stage {: #data-stage data-category=data-prep}
Intermediary storage that supports multipart upload of large datasets, reducing the chance of failure when working with large amounts of data. Upon upload, the dataset is uploaded in parts to the data stage, and once the dataset is whole and finalized, it is pushed to the AI Catalog or Batch Predictions. At any time after the first part is uploaded to the data stage, the system can instruct Batch Predictions to use the data from the data stage to fill in predictions.

#### Data store {: #data-store data-category=data-prep}
A general term used to describe a remote location where your data is stored. A data store may contain one or more databases, or one or more files of varying formats.

#### Date/time partitioning {: #data-time-partitioning data-category=time-aware;modeling }
The only valid partitioning method for time-aware projects. With date/time, rows are assigned to [backtests](#backtesting) chronologically instead of, for example, randomly. Backtests are configurable, including number, start and end times, and sampling method.

#### Deep learning {: #deep-learning data-category=modeling }
A set of algorithms that run data through several “layers” of neural network algorithms, each of which passes a simplified representation of the data to the next layer. Deep learning algorithms are essential to Visual AI capabilities, and their processing can be viewed from the Training Dashboard visualization.

#### Deployment inventory {: #deployments-inventory data-category=mlops }
The central hub for managing deployments. Located on the Deployments page, the inventory serves as a coordination point for stakeholders involved in operationalizing models. From the inventory, you can monitor deployed model performance and take action as necessary, managing all actively deployed models from a single point.

#### Detection/segmentation {: #detectionsegmentation data-category=modeling }
A computer vision technique that involves the selection of a subset of the input image data for further processing (for example, one or more images within a set of images or regions within an image).

#### Downloads tab {: #downloads-tab data-category=modeling }
A model Leaderboard tab ([Predict > Downloads](download)) where you can download model artifacts.

#### Downsampling {: #downsampling data-category=modeling;data-prep }
See [Smart downsampling](#smart-downsampling).

#### Driver {: #driver data-category=data-prep }
The software that allows the DataRobot application to interact with a database; each data connection is associated with one driver (created and installed by your administrator). The driver configuration saves the JAR file storage location in DataRobot and any additional dependency files associated with the driver. DataRobot supports JDBC drivers.

#### Dynamic dataset {: #dynamic-dataset data-category=data-prep}
A dynamic dataset is a "live" connection to the source data; however, DataRobot samples the data for profile statistics (EDA1). The catalog stores a pointer to the data and pulls it upon request, for example, when you create a project.

## E
-----------

#### EDA (Exploratory Data Analysis) {: #eda-exploratory-data-analysis data-category=modeling;data-prep }
The DataRobot approach to analyzing and summarizing the main characteristics of a dataset. Generally speaking, there are two stages of EDA:

* EDA1 provides summary statistics based on a sample of data. In EDA1, DataRobot counts, categorizes, and applies automatic feature transformations (where appropriate) to data.
* EDA2 is a recalculation of the statistics collected in EDA1 but using the entire dataset, excluding holdout. The results of this analysis are the criteria used for model building.

#### Ensemble models {: #ensemble-models data-category=modeling }
See [blender](#blender).

#### Environment {: #environment }
A Docker container where a custom task runs.

#### ESDA {: #esda data-category=modeling;data-prep }
Exploratory Spatial Data Analysis (ESDA) is the exploratory data phase for Location AI. DataRobot provides a variety of tools for conducting ESDA within the DataRobot AutoML environment, including geometry map visualizations, categorical/numeric thematic maps, and smart aggregation of large geospatial datasets.

#### Eureqa {: #eureqa data-category=modeling }
Model blueprints for Eureqa generalized additive models (Eureqa GAM), Eureqa regression, and Eureqa classification models. These blueprints use a proprietary Eureqa machine learning algorithm to construct models that balance predictive accuracy against complexity.

#### EWMA (Exponentially Weighted Moving Average) {: #ewma-exponentially-weighted-moving-average data-category=modeling }
A moving average that places a greater weight and significance on the most recent data points, measuring trend direction over time. The "exponential" aspect indicates that the weighting factor of previous inputs decreases exponentially. This is important because otherwise a very recent value would have no more influence on the variance than an older value.
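
The exponential decay of older inputs follows from the recursive form of the average, sketched below. The `ewma` helper and the choice to seed the average with the first value are illustrative assumptions; `alpha` is the smoothing factor between 0 and 1.

```python
def ewma(values, alpha):
    """Exponentially weighted moving average.

    Each new point gets weight `alpha`; the running average keeps
    weight (1 - alpha), so a value k steps back is weighted by
    alpha * (1 - alpha) ** k.
    """
    smoothed = [values[0]]  # seed with the first observation
    for value in values[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


trend = ewma([1, 2, 3], alpha=0.5)
```

A larger `alpha` makes the average react faster to recent values; a smaller one smooths more heavily.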

#### External stage {: #external-stage data-category=data-prep}
An external stage is a cloud location outside of the Snowflake environment used for loading and unloading data for Snowflake. The cloud location can be either Amazon S3 or Microsoft Azure storage.

## F
-----------

#### Fairness score {: #fairness-score data-category=modeling }
A numerical computation of model fairness against the protected class, based on the underlying fairness metric.

#### Fairness Threshold {: #fairness-threshold data-category=modeling }
The measure of whether a model performs within appropriate fairness bounds for each protected class. It does not affect the fairness score or performance of any protected class.

#### Fairness Value {: #fairness-value data-category=modeling }
Fairness scores normalized against the most favorable protected class (i.e., the class with the highest fairness score).
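
The normalization can be sketched in a few lines. The `fairness_values` helper and the class labels are illustrative, and the sketch assumes the highest fairness score is greater than zero.

```python
def fairness_values(fairness_scores):
    """Divide each class's fairness score by the highest score.

    The most favorable class ends up with a fairness value of 1.0.
    """
    best = max(fairness_scores.values())
    return {cls: score / best for cls, score in fairness_scores.items()}


values = fairness_values({"group_a": 0.8, "group_b": 0.4})
```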

#### Favorable Outcome {: #favorable-outcome data-category=modeling }
A value of the target that is treated as the favorable outcome for the model, used in bias and fairness modeling. Predictions from a binary classification model can be categorized as being a favorable outcome (i.e., good/preferable) or an unfavorable outcome (i.e., bad/undesirable) for the protected class.

#### FDW {: #fdw data-category=time-aware;modeling }
See [Feature Derivation Window](#feature-derivation-window).

#### Feature {: #feature data-category=data-prep}
A column in a dataset, also called "variable" or "feature variable." The target feature is the name of the column in the dataset that you would like to predict.

#### Feature Derivation Window {: #feature-derivation-window data-category=time-aware;modeling;data-prep }
Also known as FDW; used in time series modeling. A rolling window of past values that models use to derive features for the modeling dataset. Defined relative to the [Forecast Point](#forecast-point), the window sets the number of recent values the model can use for forecasting.

#### Feature Discovery {: #feature-discovery data-category=data-prep }
A DataRobot capability that discovers and generates new features from multiple datasets, eliminating the need to perform manual feature engineering to consolidate multiple datasets into one. A relationship editor visualizes these relationships and the end product is additional, derived features that result from the created linkages.  

#### Feature Effects {: #feature-effects data-category=modeling }
A model Leaderboard tab ([Understand > Feature Effects](feature-effects)) that shows the effect of changes in the value of each feature on the model’s predictions. It displays a graph depicting how a model "understands" the relationship between each feature and the target, with the features sorted by [Feature Impact](#feature-impact).

#### Feature engineering {: #feature-engineering data-category=modeling;time-aware;data-prep }
The generation of additional features in a dataset to improve model accuracy and performance. Time series and Feature Discovery both rely on feature engineering as the basis of their functionality.

#### Feature extraction {: #feature-extraction data-category=modeling }
Models that perform image preprocessing (or image feature extraction and image preprocessing) are also known as “image feature extraction models” or “image-specific models.”

#### Feature Extraction and Reduction (FEAR) {: #feature-extraction-and-reduction-fear data-category=time-aware;modeling }
The feature generation process for time series modeling (e.g., lags, moving averages). It extracts new features (now) and then reduces the set of extracted features (later). See time series feature derivation.

#### Feature Impact {: #feature-impact data-category=modeling }
A measurement that identifies which features in a dataset have the greatest effect on model decisions. In DataRobot, the measurement is reported as a visualization available from the Leaderboard.

#### Feature imputation {: #feature-imputation data-category=time-aware;modeling }
A mechanism in time series modeling that uses forward filling to enable imputation for all features (target and others) when using the time series data prep tool. This results in a dataset with no missing values (with the possible exception of leading values at the start of each series where there is no value to forward fill).
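
Forward filling itself is simple to sketch. The example below assumes that missing values are represented as `None` and that each series is filled independently; the function name `forward_fill` is illustrative.

```python
def forward_fill(values):
    """Replace each missing value (None) with the last observed value."""
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        # Leading gaps stay None because nothing has been observed yet.
        filled.append(last_seen)
    return filled


filled = forward_fill([None, 3, None, None, 8, None])
```

The leading `None` remains because no earlier value exists to carry forward, which matches the exception noted above.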

#### Feature list {: #feature-list data-category=modeling;data-prep}
A subset of features from a dataset used to build models. DataRobot creates several lists during EDA2 including all informative features, informative features excluding those with a leakage risk, a raw list of all original features, and a reduced list. Users can create project-specific lists as well.

#### Fitting {: #fitting data-category=modeling }
See [model fitting](#model-fitting).

#### Forecast Distance {: #forecast-distance data-category=time-aware;modeling }
A unique time step&mdash;a relative position&mdash;within the Forecast Window in a time series modeling project. A model outputs one row for each Forecast Distance.

#### Forecast Point {: #forecast-point data-category=time-aware;modeling }
In time series modeling, the point from which you make a prediction; a relative time ("if it was now..."). DataRobot trains models using all potential forecast points in the training data. In production, it is typically the most recent time.

#### Forecast vs Actual {: #forecast-vs-actual data-category=time-aware;modeling }

A model Leaderboard tab ([Evaluate > Forecast vs Actual](fore-act)) commonly used in time series projects that allows you to compare how different predictions behave from different forecast points to different times in the future. Although similar to the [Accuracy Over Time](#accuracy-over-time) chart, which displays a single forecast at a time, the Forecast vs Actual chart shows multiple forecast distances in one view.

#### Forecast Window {: #forecast-window data-category=time-aware;modeling }
Also known as FW; used in time series modeling. Beginning from the Forecast Point, defines the range (the Forecast Distance) of future predictions&mdash;"this is the range of time I care about." DataRobot then optimizes models for that range and ranks them on the Leaderboard on the average across that range.
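
To make the relationship between the Forecast Point, Forecast Window, and Forecast Distance concrete, here is a small sketch that lists one output row per forecast distance. The daily time step, the window bounds, and the function name `forecast_rows` are assumptions chosen for illustration.

```python
from datetime import date, timedelta


def forecast_rows(forecast_point, window_start, window_end, step=timedelta(days=1)):
    """List one row per forecast distance inside the forecast window.

    window_start and window_end are counted in time steps ahead of the
    forecast point (inclusive).
    """
    if window_start > window_end:
        raise ValueError("window_start must not exceed window_end")
    return [
        {
            "forecast_point": forecast_point,
            "forecast_distance": distance,
            "timestamp": forecast_point + distance * step,
        }
        for distance in range(window_start, window_end + 1)
    ]


rows = forecast_rows(date(2024, 1, 1), window_start=1, window_end=3)
```

A window of 1 to 3 steps produces three rows, for January 2, 3, and 4, each tagged with its distance from the forecast point.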

#### Forecasting {: #forecasting data-category=time-aware;modeling }
Predictions based on time, into the future; use inputs from recent rows to predict future values. Forecasting is a subset of predictions, using trends in observation to characterize expected outcomes or expected responses.

{% include 'includes/genai/foundational-include.md' %}

#### Frozen run {: #frozen-run data-category=modeling }
A process that “freezes” parameter settings from a model’s early, small sample size-based run and reuses them when training on a larger sample. This works because parameter settings based on smaller samples tend to also perform well on larger samples of the same data.

#### FW {: #fw }
See [Forecast Window](#forecast-window).

## G
-----------

{% include 'includes/genai/genai-include.md' %}

#### Governance lens {: #governance-lens data-category=mlops }
A filtered view of DataRobot's deployment inventory on the Deployments page, summarizing the social and operational aspects of a deployment. These include the deployment owner, how the model was built, the model's age, and the humility monitoring status.

#### GPU (graphics processing unit) {: #gpu-graphics-processing-unit data-category=modeling }
A mechanism for processing computational tasks. GPUs are highly optimized for mathematical calculations and excel at parallelism, but only for less complex tasks. Deep learning specifically benefits from this because it consists mainly of batches of matrix multiplications, which can be parallelized very easily.

#### Grid search {: #grid-search data-category=modeling }
An exhaustive search method used for hyperparameters.
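
A minimal sketch of an exhaustive search follows. It is not how DataRobot tunes models: the hyperparameter names, the candidate values, and the toy scoring function are assumptions standing in for a real train-and-validate step.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    # Stand-in for validation performance; peaks at learning_rate=0.1, max_depth=4.
    return -((params["learning_rate"] - 0.1) ** 2) - (params["max_depth"] - 4) ** 2


grid = {"learning_rate": [0.01, 0.1, 1.0], "max_depth": [2, 4, 8]}
best_params, best_score = grid_search(grid, toy_score)
```

Because every combination is evaluated, the cost grows multiplicatively with each added hyperparameter: this grid has 3 × 3 = 9 candidates.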

## H
-----------

#### Holdout {: #holdout data-category=modeling }
A subset of data that is unavailable to models during the training and validation process. Use the Holdout score for a final estimate of model performance only after you have selected your best model. See also [Validation](#validation).

#### Humility {: #humility data-category=mlops }
A user-defined set of rules for deployments that allow models to be capable of recognizing, in real-time, when they make uncertain predictions or receive data they have not seen before. Unlike data drift, model humility does not deal with broad statistical properties over time&mdash;it is instead triggered for individual predictions, allowing you to set desired behaviors with rules that depend on different triggers.

## I
-----------

#### Image data {: #image-data data-category=modeling;data-prep }
A sequence of digital images (e.g., video), a set of digital images, a single digital image, and/or one or more portions of any of these&mdash;data used as part of Visual AI. A digital image may include an organized set of picture elements (“pixels”) stored in a file. Any suitable format and type of digital image file may be used, including but not limited to raster formats (e.g., TIFF, JPEG, GIF, PNG, BMP, etc.), vector formats (e.g., CGM, SVG, etc.), compound formats (e.g., EPS, PDF, PostScript, etc.), and/or stereo formats (e.g., MPO, PNS, JPS).

#### Image preprocessing {: #image-preprocessing data-category=modeling }
A computer vision technique, part of Visual AI. Some examples include image re-sampling, noise reduction, contrast enhancement, and scaling (e.g., generating a scale space representation). Extracted features may be:

* Low-level: raw pixels, pixel intensities, pixel colors, gradients, textures, color histograms, motion vectors, edges, lines, corners, ridges, etc.
* Mid-level: shapes, surfaces, volumes, etc.
* High-level: objects, scenes, events, etc.

#### Inference data {: #inference-data data-category=predictions }
Data that is scored by applying an algorithmic model built from a historical dataset in order to uncover practical insights. See also [Scoring data](#scoring-data).

#### In-sample predictions {: #in-sample-predictions data-category=predictions }
Models trained on data outside of the training set (i.e., Validation and potentially Holdout). DataRobot uses 64% of the training set by default. When models are trained with a sample size above 64%, DataRobot marks the _Validation_ score with an asterisk to indicate that some in-sample predictions were used for that score. If you train above 80%, the _Holdout_ score is also asterisked. Compare to [stacked](#stacked-predictions) (out-of-sample) predictions.

#### Irregular data {: #irregular-data data-category=time-aware;modeling;data-prep }
Data in which no consistent spacing and no time step is detected. Used in time-aware modeling.

## K
-----------

#### KA {: #ka }
See [Known in advance features](#known-in-advance-features).

#### Known in advance features {: #known-in-advance-features data-category=time-aware;modeling }
Also known as KA; used in time series modeling. A variable whose value you know in advance and that does not need to be lagged, such as holiday dates. For example, you might know that a product will be on sale next week, so you can provide the pricing information in advance.

## L
-----------

#### Large language model (LLM) {: #large-language-model-llm data-category=gen-ai }
An algorithm that uses deep learning techniques and large datasets to understand, summarize, generate, and predict new content.

#### Leaderboard {: #leaderboard data-category=modeling }
The list of trained blueprints (models) for a project, ranked according to a project metric.

#### Leakage {: #leakage data-category=data-prep}
See [target leakage](#target-leakage).

#### Learning Curves {: #learning-curves data-category=modeling }
A graph to help determine whether it is worthwhile to increase the size of a dataset. The Learning Curve graph illustrates, for the top-performing models, how model performance varies as the sample size changes.

#### Lift Chart {: #lift-chart data-category=modeling }
Depicts how well a model segments the target population and how capable it is of predicting the target to help visualize model effectiveness.

#### Linkage keys {: #linkage-keys data-category=data-prep }
(Feature Discovery) The features in the primary dataset used as keys to join and create relationships.

{% include 'includes/genai/llm-misc-include.md' %}

#### Location AI {: #location-ai data-category=modeling }
DataRobot's support for geospatial analysis by natively ingesting common geospatial formats and recognizing coordinates, allowing [ESDA](#esda), and providing spatially-explicit modeling tasks and visualizations.

#### Log {: #log data-category=modeling }
A model Leaderboard tab ([Describe > Log](log)) that displays the status of successful operations with green INFO tags, along with information about errors marked with red ERROR tags.

## M
-----------

#### Machine Learning Operations {: #machine-learning-operations }
See [MLOps](#mlops-maching-learning-operations).

#### Majority class {: #majority-class data-category=modeling;data-prep }
If you have a categorical variable (e.g., `true`/`false` or `cat`/`mouse`), the value that's more frequent is the majority class. For example, if a dataset has 80 rows of value `cat` and 20 rows of value `mouse`, then `cat` is the majority class. See also [minority class](#minority-class).
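
The `cat`/`mouse` example can be reproduced with a short sketch that counts label frequencies. It uses only the Python standard library and the 80/20 split from the definition.

```python
from collections import Counter

# 80 rows of "cat" and 20 rows of "mouse", as in the example above.
labels = ["cat"] * 80 + ["mouse"] * 20

counts = Counter(labels)
ranked = counts.most_common()  # sorted from most to least frequent

majority_class = ranked[0][0]
minority_class = ranked[-1][0]
```

Here `majority_class` is `cat` and `minority_class` is `mouse`.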

#### Make Predictions tab {: #make-predictions-tab }
A model Leaderboard tab ([Predict > Make Predictions](predict)) that allows you to make predictions before deploying a model to a production environment.

#### Management agent {: #management-agent data-category=mlops }
A downloadable client included in the MLOps agent tarball (accessed via **Developer Tools**) that allows you to manage external models (i.e., those running outside of DataRobot MLOps). This tool provides a standard mechanism to automate model deployment to any type of infrastructure. The management agent sends periodic updates about deployment health and status via the API and reports them as MLOps events on the Service Health page.

#### Manual {: #manual data-category=modeling }
A modeling mode that causes DataRobot to complete EDA2 and prepare data for modeling, but does not execute model building. Instead, users select specific models to build from the model Repository.

#### Materialized {: #materialized data-category=data-prep }
Data that DataRobot has pulled from the data asset and is currently keeping a copy of in the catalog. See also [snapshot](#snapshot) and [unmaterialized](#unmaterialized) data.

#### Metadata {: #metadata data-category=data-prep }
Details of the data asset, such as creation and modification dates, number and types of features, snapshot status, and more.

#### Metric {: #metric }
See [optimization metric](#optimization-metric).

#### Minority class {: #minority-class data-category=modeling;data-prep }
If you have a categorical variable (e.g., `true`/`false` or `cat`/`mouse`), the value that's less frequent is the minority class. For example, if a dataset has 80 rows of value `cat` and 20 rows of value `mouse`, then `mouse` is the minority class. See also [majority class](#majority-class).

#### MLOps (Machine Learning Operations) {: #mlops-maching-learning-operations data-category=mlops }
A scalable and governed means to rapidly deploy and manage ML applications in production environments.

#### MLOps agent {: #mlops-agent data-category=mlops }
One of two downloadable clients included in the MLOps agent tarball (accessed via **Developer Tools**) that allows you to monitor and manage external models (i.e., those running outside of DataRobot MLOps). See [monitoring agent](#monitoring-agent) and [management agent](#management-agent).

#### Models/modeling {: #modelsmodeling data-category=modeling }
A trained ML pipeline, capable of scoring new data. Models&mdash;descriptive, predictive, prescriptive&mdash;form the basis of data analysis. Modeling extracts insights from data that you can then use to make better business decisions. Algorithmic models tell you which outcome is likely to hold true for your target variable based on your training data. They construct a representation of the relationships and tease out patterns between all the different features in your dataset that you can apply to similar data you collect in the future, allowing you to make decisions based on those patterns and relationships.

#### Model Comparison {: #model-comparison data-category=modeling }
A Leaderboard tab that allows you to compare two models using different evaluation tools, helping identify the model that offers the highest business returns or candidates for blender models.

#### Model fitting {: #model-fitting data-category=modeling }
A measure of how well a model generalizes to data similar to the data on which it was trained. A model that is well-fitted produces more accurate outcomes. A model that is overfitted matches the data too closely. A model that is underfitted doesn’t match closely enough.

#### Model Info {: #model-info data-category=modeling }
A model Leaderboard tab ([Describe > Model Info](model-info)) that displays an overview for a given model, including model file size, prediction time, and sample size.

#### Model package {: #model-package data-category=mlops }
Archived model artifacts with associated metadata stored in the Model Registry. Model packages can be created manually or automatically, for example, through the deployment of a custom model. You can deploy, share, and permanently archive model packages.

#### Model Registry {: #model-registry data-category=mlops }
An organizational hub for the variety of models used in DataRobot. Models are registered as deployment-ready model packages; the registry lists each package available for use. Each package functions the same way, regardless of the origin of its model. The Model Registry also contains the Custom Model Workshop where you can create and deploy custom models. Model packages can be created manually or automatically depending on the type of model.

#### Model scoring {: #model-scoring data-category=modeling }
The process of applying an optimization metric to a partition of the data and assigning a numeric score that can be used to evaluate a model performance.

#### Modeling dataset {: #modeling-dataset data-category=time-aware;modeling;data-prep }
A transform of the original dataset that pre-shifts data to future values, generates lagged time series features, and computes time-series analysis metadata. Commonly referred to as feature derivation, it is used by time series but not OTV. See the [time series feature engineering reference](feature-eng) for a list of operators used and feature names created by the feature derivation process. See also [FEAR](#feature-extraction-and-reduction-fear).

#### Modeling mode {: #modeling-mode data-category=modeling }
A setting that controls the sample percentages of the training set that DataRobot uses to build models. DataRobot offers four modeling modes: [Autopilot](#autopilot-full-autopilot), [Quick](#quick-autopilot) (the default), [Manual](#manual), and [Comprehensive](#comprehensive).

#### Monitoring agent {: #monitoring-agent data-category=mlops }
A downloadable client included in the MLOps agent tarball (accessed via **Developer Tools**) that allows you to monitor external models (i.e., those running outside of DataRobot MLOps). With this functionality, predictions and information from these models can be reported as part of deployments. You can use this tool to monitor accuracy, data drift, prediction distribution, latency, and more, regardless of where the model is running.

#### Monotonic modeling {: #monotonic-modeling data-category=modeling }
A method to force certain XGBoost models to learn only monotonic (always increasing or always decreasing) relationships between specific features and the target.

#### Multiclass {: #multiclass }
See [classification](#classification).

#### Multilabel {: #multilabel data-category=modeling }
A classification task where each row in a dataset is associated with one, several, or zero labels. Common multilabel classification problems are text categorization (a movie is both "crime" and "drama") and image categorization (an image shows a house and a car).

#### Multimodal {: #multimodal data-category=modeling }
A model type that supports multiple variable types at the same time, in the same model.

#### Multiseries {: #multiseries data-category=time-aware;modeling;data-prep }
Datasets that contain multiple time series (for example, to forecast the sales of multiple stores) based on a common set of input features.

## N
-----------

#### Naive model {: #naive-model }
See [baseline model](#baseline-model).

#### No-Code AI Apps {: #no-code-ai-apps }
A no-code interface to create AI-powered applications that enable core DataRobot services without having to build models and evaluate their performance. Applications are easily shared and do not require consumers to own full DataRobot licenses in order to use them.


#### N-gram {: #n-gram data-category=modeling }
A sequence of words, where N is the number of words. For example, "machine learning" is a 2-gram. Text features are divided into n-grams to prepare for natural language processing (NLP).
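
Splitting text into word n-grams can be sketched as below. Whitespace tokenization is an assumption made for brevity, and the function name `word_ngrams` is illustrative; real text preprocessing typically also handles punctuation and casing.

```python
def word_ngrams(text, n):
    """Return all runs of n consecutive words in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


bigrams = word_ngrams("machine learning is fun", 2)
```

The four-word input yields three 2-grams: "machine learning", "learning is", and "is fun".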

#### Notebook {: #notebook data-category=modeling }
An interactive, computational environment that hosts code execution and rich media. DataRobot provides its own in-app environment to create, manage, and execute Jupyter-compatible hosted notebooks.


#### Nowcasting {: #nowcasting data-category=time-aware;modeling }
A method of time series modeling that predicts the current value of a target based on past and present data. Technically, it is a forecast window in which the start and end times are 0 (now).

## O
-----------

#### Offset {: #offset data-category=modeling }
Feature(s) that should be treated as a fixed component for modeling (coefficient of 1 in generalized linear models or gradient boosting machine models). Offsets are often used to incorporate pricing constraints or to boost existing models.

#### Optimization metric {: #optimization-metric data-category=modeling }
An error metric used in DataRobot to determine how well a model predicts actual values. After you choose a target feature, DataRobot selects an optimization metric based on the modeling task.

#### Ordering feature {: #ordering-feature data-category=time-aware }
The primary date/time feature that DataRobot will use for modeling. Options are detected during [EDA1](#eda).

#### OTV {: #otv data-category=time-aware;modeling }
Also known as out-of-time validation. A method for modeling time-relevant data. With OTV you are not forecasting, as with [time series](#autots-automated-time-series). Instead, you are predicting the target value on each individual row.

#### Overfitting {: #overfitting data-category=modeling }
A situation in which a model fits its training data too well and therefore loses its ability to perform accurately against unseen data. This happens when a model trains too long on the training data and learns (and models on) its "noise," making the model unable to generalize.

## P
-----------

#### Partition {: #partition data-category=modeling }
Segments (splits) of the training data, divided to maximize accuracy. See also [training](#training-data), [validation](#validation), [cross-validation](#cross-validation), and [holdout](#holdout).

#### Per-Class Bias {: #per-class-bias data-category=modeling }
A model Leaderboard tab ([Bias and Fairness > Per-Class Bias](per-class)) that helps to identify if a model is biased, and if so, how much and who it's biased towards or against. [Bias and Fairness settings](fairness-metrics) must be configured.

#### PID (project identifier) {: #pid-project-identifier data-category=modeling }
An internal identifier used for uniquely identifying a project.

#### PII {: #pii data-category=modeling }
Personally identifiable information, including name, pictures, home address, SSN or other identifying numbers, birth date, and more. DataRobot automates the detection of specific types of personal data to provide a layer of protection against the inadvertent inclusion of this information in a dataset.

{% include 'includes/genai/playground-include.md' %}

#### Portable prediction server (PPS) {: #portable-prediction-server-pps data-category=mlops }
A DataRobot execution environment for DataRobot model packages (`.mlpkg` files) distributed as a self-contained Docker image. It can be run disconnected from main installation environments.

#### Predicting {: #predicting data-category=predictions }
For non-time-series modeling. Use information in a row to determine the target for that row. Prediction uses explanatory variables to characterize expected outcomes or expected responses (e.g., a specific event in the future, gender, fraudulent transactions).

#### Prediction data {: #prediction-data data-category=mlops;predictions }
Data that contains prediction requests and results from the model.

#### Prediction environment {: #prediction-environment data-category=mlops;predictions }
An environment configured to manage deployment predictions on an external system, outside of DataRobot. Prediction environments allow you to configure deployment permissions and approval processes. Once configured, you can specify a prediction environment for use by DataRobot models running on the Portable Prediction Server and for remote models monitored by the MLOps monitoring agent.

#### Prediction Explanations {: #prediction-explanations data-category=modeling }
A visualization that helps to illustrate what drives predictions on a row-by-row basis&mdash;they provide a quantitative indicator of the effect variables have on a model, answering why a given model made a certain prediction. It helps to understand why a model made a particular prediction so that you can then validate whether the prediction makes sense. See also [SHAP](#shap-shapley-values), [XEMP](#xemp-exemplar-based-explanations-of-model-predictions).

#### Prediction intervals {: #prediction-intervals data-category=modeling }
Prediction intervals help DataRobot assess and describe the uncertainty in a single record prediction by including an upper and lower bound on a point estimate (e.g., a single prediction from a machine learning model). The prediction intervals provide a probable range of values that the target may fall into on future data points.

#### Prediction point {: #prediction-point data-category=modeling;data-prep }
The point in time when you made or will make a prediction. Plan your prediction point based on the production model (for example, “one month before renewal” or “loan application submission time”). Once defined, create that entry in the training data to help avoid lookahead bias. With [Feature Discovery](fd-overview), you define the prediction point to ensure the derived features only use data prior to that point.

#### Primary dataset {: #primary-dataset data-category=data-prep }
(Feature Discovery) The dataset used to start a project.

#### Primary features {: #primary-features data-category=data-prep }
(Feature Discovery) Features in the project’s primary dataset.

#### Project {: #project data-category=modeling;data-prep }
A referenceable item that includes a dataset, which is the source used for training, and any models built from the dataset. Projects can be created and accessed from the home page, the project control center, and the AI Catalog. They can be shared with users, groups, and an organization.

{% include 'includes/genai/prompt-include.md' %}

#### Protected class {: #protected-class data-category=modeling }
One categorical value of the protected feature, used in bias and fairness modeling.

#### Protected feature {: #protected-feature data-category=modeling }
The dataset column to measure fairness of model predictions against. Model fairness is calculated against the protected features from the dataset. Also known as “protected attribute.”

## Q
-----------

#### Quick (Autopilot) {: #quick-autopilot data-category=modeling }
A shortened version of the full Autopilot modeling mode that runs models directly at 64%. With Quick, the 16% and 32% sample sizes are not executed. DataRobot selects models to run based on a variety of criteria, including target and performance metric, but as its name suggests, chooses only models with relatively short training runtimes to support quicker experimentation.

## R
-----------

#### Rating Table {: #rating-table data-category=modeling }
A model Leaderboard tab ([Describe > Rating Table](rating-table)) where you can export the model's complete, validated parameters.

#### Real-time predictions {: #real-time-predictions data-category=predictions }
Method of making predictions when low latency is required. Use the Prediction API for real-time deployment predictions on a dedicated and/or a standalone prediction server.

#### Receiver Operating Characteristic Curve {: #receiver-operating-characteristic-curve }
See [ROC Curve](#roc-curve).

#### Regression {: #regression data-category=modeling }
A type of prediction problem that predicts continuous values (for example, 1.7, 6, 9.8…). See also [classification](#classification).

#### Regular data {: #regular-data data-category=time-aware;modeling }
Data is regular if rows in the dataset fall on an evenly spaced time grid (e.g., there’s one row for every hour across the entire dataset). See also [time step](#time-step) and [semi-regular data](#semi-regular-data).
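
One simple way to test for an evenly spaced time grid is to check that all consecutive gaps are equal. This sketch is an illustration only and does not reproduce DataRobot's time step detection; the function name `is_regular` is hypothetical.

```python
from datetime import datetime


def is_regular(timestamps):
    """Return True if consecutive timestamps are all the same distance apart."""
    ordered = sorted(timestamps)
    gaps = {later - earlier for earlier, later in zip(ordered, ordered[1:])}
    return len(gaps) <= 1


hourly = [datetime(2024, 1, 1, hour) for hour in range(5)]
with_gap = hourly[:2] + hourly[3:]  # drop the 02:00 row

hourly_is_regular = is_regular(hourly)
gap_is_regular = is_regular(with_gap)
```

The hourly series passes the check, while removing a single row introduces a two-hour gap and the check fails.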

#### Relationships {: #relationships data-category=data-prep }
(Feature Discovery) Relationships between datasets. Each relationship involves a pair of datasets, and a join key from each dataset. A key comprises one or more columns of a dataset. The keys from both datasets are ordered, and must have the same number of columns. The combination of keys is used to determine how two datasets are joined.
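
The role of ordered, equal-length keys can be sketched with a plain inner join over lists of dictionaries. This is an illustration of the concept rather than Feature Discovery itself; the dataset contents, column names, and the function name `join_on_keys` are all made up.

```python
def join_on_keys(primary_rows, secondary_rows, primary_key, secondary_key):
    """Inner-join two datasets on ordered, equal-length key column lists."""
    if len(primary_key) != len(secondary_key):
        raise ValueError("both keys must have the same number of columns")
    # Index secondary rows by their key values, in key-column order.
    index = {}
    for row in secondary_rows:
        index.setdefault(tuple(row[col] for col in secondary_key), []).append(row)
    joined = []
    for row in primary_rows:
        for match in index.get(tuple(row[col] for col in primary_key), []):
            # Primary values win if both datasets share a column name.
            joined.append({**match, **row})
    return joined


customers = [{"cust_id": 1, "region": "EU"}, {"cust_id": 2, "region": "US"}]
orders = [
    {"customer": 1, "area": "EU", "amount": 30},
    {"customer": 1, "area": "EU", "amount": 12},
    {"customer": 2, "area": "EU", "amount": 99},
]
joined = join_on_keys(customers, orders, ["cust_id", "region"], ["customer", "area"])
```

Because the keys are compared position by position, `cust_id` pairs with `customer` and `region` with `area`. The third order does not join because its `area` does not match customer 2's `region`.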

#### Remote models {: #remote-models data-category=mlops }
Models running outside of DataRobot in external prediction environments, often monitored by the MLOps monitoring agent to report statistics back to DataRobot.

#### Repository {: #repository data-category=modeling }
A library of modeling blueprints available for a selected project (based on the problem type). These models may be selected and built by DataRobot and also can be user-executed.

#### ROC Curve {: #roc-curve data-category=modeling }
Also known as Receiver Operating Characteristic Curve. A visualization that helps to explore classification, performance, and statistics related to a selected model at any point on the probability scale. In DataRobot, the visualization is available from the Leaderboard.

#### Role {: #role data-category=data-prep }
Roles (Owner, Consumer, and Editor) describe the capabilities provided to each user for a given dataset. This supports scenarios in which the user creating a data source or data connection and the end user are not the same, or in which there are multiple end users of the asset.

## S
-----------

#### Sample size {: #sample-size data-category=modeling;data-prep }
The percentage of the total training data used to build models. The percentage is based on the selected modeling mode or can be user-selected.

#### Scoring {: #scoring data-category=predictions }
See [Model scoring](#model-scoring), [Scoring data](#scoring-data).

#### Scoring Code {: #scoring-code data-category=mlops;predictions }
A method for using DataRobot models outside of the application. It is available for select models from the Leaderboard as a downloadable JAR file containing Java code that can be used to score data from the command line.

An exportable JAR file, available for select models, that runs in Java. Scoring Code JARs contain prediction calculation logic identical to the DataRobot API&mdash;the code generation mechanism tests each model for accuracy as a part of the generation process.

#### Scoring data {: #scoring-data data-category=predictions }
Applying an algorithmic model built from a historical dataset to a new dataset in order to uncover practical insights. Common scoring methods are batch and real-time scoring. "Scored data" (also called "inference data") refers to the dataset being scored.

#### Seasonality {: #seasonality data-category=time-aware;modeling }
Repeating highs and lows observed at different times of year, within a week, day, etc. Periodicity. For example, temperature is very seasonal (hot in the summer, cold in the winter, hot during the day, cold at night). Applicable to time series modeling.

#### Secondary dataset {: #secondary-dataset data-category=data-prep }
(Feature Discovery) A dataset that is added to a project and part of a relationship with the primary dataset.

#### Secondary features {: #secondary-features data-category=data-prep }
(Feature Discovery) Features derived from a project’s secondary datasets.

#### Segmented analysis {: #segmented-analysis data-category=mlops }
A deployment utility that filters data drift and accuracy statistics into unique segment attributes and values. Useful for identifying operational issues with training and prediction request data.

#### Segmented modeling {: #segmented-modeling data-category=time-aware;modeling}
A method of modeling [multiseries](#multiseries) projects by generating a model for each segment. DataRobot selects the best model for each segment (the segment champion) and includes the segment champions in a single Combined Model that you can deploy.
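
Selecting a champion per segment can be sketched as follows. The segment names, model names, and error scores are invented, and the sketch assumes that a lower score is better; the function name `pick_segment_champions` is hypothetical.

```python
def pick_segment_champions(scores_by_segment):
    """Pick the model with the lowest error for each segment."""
    return {
        segment: min(model_scores, key=model_scores.get)
        for segment, model_scores in scores_by_segment.items()
    }


# Hypothetical validation errors per segment and model.
scores = {
    "north": {"model_a": 1.2, "model_b": 0.9},
    "south": {"model_a": 0.7, "model_b": 1.1},
}
champions = pick_segment_champions(scores)
```

The resulting mapping of segment to champion is the kind of lookup a combined model needs in order to route each row to the right per-segment model.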

#### Semi-regular data {: #semi-regular-data data-category=time-aware;modeling }
Data is semi-regular if most time steps are regular but there are some small gaps (e.g., business days, but no weekends). See also [regular data](#regular-data) and [time steps](#time-step).

#### Segment ID {: #segment-id data-category=time-aware;modeling }
A column in a dataset used to group series into segments for a multiseries project. A segment ID is required for the segmented modeling workflow, where DataRobot builds a separate model for each segment. See also [Segmented modeling](ts-segmented).

#### Series ID {: #series-id data-category=time-aware;modeling }
A column in a dataset used to divide a dataset into series for a multiseries project. The column contains labels indicating which series each row belongs to. See also [Multiseries modeling](multiseries).

#### Service health {: #service-health data-category=mlops }
A performance monitoring component for deployments that tracks metrics about a deployment’s ability to respond to prediction requests quickly and reliably. Useful for identifying bottlenecks and assessing prediction capacity.

#### SHAP (Shapley Values) {: #shap-shapley-values data-category=modeling }
A fast, open-source methodology for computing Prediction Explanations for tree-based, deep learning, and linear-based models. SHAP estimates how much each feature contributes to a given prediction differing from the average. It is additive, making it easy to see how much top-N features contribute to a prediction. See also [Prediction Explanations](#prediction-explanations), [XEMP](#xemp-exemplar-based-explanations-of-model-predictions).
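
The additive property can be illustrated with a minimal sketch. It assumes a plain linear model with independent features, where the SHAP value of each feature reduces to its weight times the feature's distance from its mean. The weights, means, and function names below are hypothetical and do not reflect DataRobot's implementation.

```python
def predict(weights, bias, row):
    """Prediction of a simple linear model."""
    return bias + sum(w * x for w, x in zip(weights, row))


def linear_shap_values(weights, feature_means, row):
    """Per-feature contributions for a linear model with independent features."""
    return [w * (x - mean) for w, mean, x in zip(weights, feature_means, row)]


# Hypothetical model and data, chosen only for illustration.
weights = [2.0, -1.0, 0.5]
bias = 1.0
feature_means = [1.0, 2.0, 4.0]
row = [3.0, 1.0, 4.0]

contributions = linear_shap_values(weights, feature_means, row)
average_prediction = predict(weights, bias, feature_means)

# Additivity: the average prediction plus all contributions equals the prediction.
reconstructed = average_prediction + sum(contributions)
```

Here the contributions sum to the gap between this row's prediction and the average prediction, which is what makes it straightforward to rank the top-N features.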

#### Smart downsampling {: #smart-downsampling data-category=modeling;data-prep }
A technique to reduce total dataset size by reducing the size of the majority class, enabling you to build models faster without sacrificing accuracy. When enabled, all analysis and model building is based on the new dataset size after smart downsampling.
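
As a rough sketch of the idea, the following keeps every minority-class row and randomly samples a fraction of the majority class. The function name, the `keep_fraction` parameter, and the `churn` column are assumptions made for illustration; the sketch does not reproduce how DataRobot performs or compensates for the downsampling.

```python
import random


def downsample_majority(rows, label_key, majority_label, keep_fraction, seed=0):
    """Keep all minority rows and a random sample of the majority rows."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    majority = [r for r in rows if r[label_key] == majority_label]
    minority = [r for r in rows if r[label_key] != majority_label]
    n_keep = max(1, round(len(majority) * keep_fraction))
    return minority + rng.sample(majority, n_keep)


# Hypothetical imbalanced dataset: 90 rows of class 0, 10 rows of class 1.
rows = [{"id": i, "churn": 0} for i in range(90)]
rows += [{"id": 90 + i, "churn": 1} for i in range(10)]

reduced = downsample_majority(rows, "churn", majority_label=0, keep_fraction=0.5)
```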

#### Snapshot {: #snapshot data-category=data-prep }
An asset created from a data source. For example, with a database it represents either the entire database or a selection of (potentially joined) tables, taken at a particular point in time. It is taken from a live database but creates a static, read-only copy of data. DataRobot creates a snapshot of each data asset type, while allowing you to disable the snapshot when importing the data.

#### Speed vs Accuracy {: #speed-vs-accuracy data-category=modeling }
A Leaderboard tab that generates an analysis plot to show the tradeoff between runtime and predictive accuracy and help you choose the best model with the lowest overhead.

#### Stability {: #stability data-category=modeling }
A model Leaderboard tab ([Evaluate > Stability](stability)) that provides an at-a-glance summary of how well a model performs on different backtests. The backtesting information in this chart is the same as that available from the [Model Info](#model-info) tab.

#### Stacked predictions {: #stacked-predictions data-category=predictions }
A method for building multiple models on different subsets of the data. The prediction for any row is made using a model that excluded that data from training. In this way, each prediction is effectively an “out-of-sample” prediction. See an example in the [predictions documentation](data-partitioning#what-are-stacked-predictions). Compare to ["in-sample"](#in-sample-predictions) predictions.
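
The mechanism can be sketched in a few lines. This example assumes a simple interleaved fold assignment and uses the mean of the training targets as a stand-in "model"; both choices, and the function name, are hypothetical simplifications.

```python
def stacked_predictions(targets, n_folds=3):
    """Predict each row with a model trained only on the other folds."""
    n = len(targets)
    predictions = [None] * n
    for fold in range(n_folds):
        # Rows in this fold are excluded from training.
        held_out = [i for i in range(n) if i % n_folds == fold]
        train = [targets[i] for i in range(n) if i % n_folds != fold]
        fold_model = sum(train) / len(train)  # stand-in model: training mean
        for i in held_out:
            predictions[i] = fold_model
    return predictions


targets = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
preds = stacked_predictions(targets, n_folds=3)
```

No row's own target value contributes to its prediction, so every entry in `preds` is out-of-sample.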

#### Stationarity {: #stationarity data-category=time-aware;modeling }
A property of a series whose mean does not change over time. A stationary series does not have a trend or seasonal variation. Applicable to time series modeling. See also [trend](#trend).

#### Supervised learning {: #supervised-learning data-category=modeling }
Machine learning using labeled data, meaning that for each record, the dataset contains a known value for the target feature. By knowing the target during training, the model can "learn" how other features relate to the target and make predictions on new data. See also [unsupervised learning](#unsupervised-learning).

{% include 'includes/genai/system-prompt-include.md' %}

## T
-----------

#### Target {: #target data-category=modeling;data-prep }
The name of the column in the dataset that you would like to predict.

#### Target leakage {: #target-leakage data-category=modeling;data-prep }
An outcome when using a feature whose value cannot be known at the time of prediction (for example, using the value for “churn reason” from the training dataset to predict whether a customer will churn). Including the feature in the model’s feature list would incorrectly influence the prediction and can lead to overly optimistic models.

#### Task {: #task data-category=modeling }
An ML method, for example a data transformation such as one-hot encoding, or an estimation such as an XGBoost classifier, which is used to define a blueprint. There are hundreds of built-in tasks you can use, or you can define your own (custom) tasks.

#### Time series {: #time-series data-category=time-aware;modeling }
A series of data points indexed in time order. Ordinarily a sequence of measurements taken at successive, equally spaced intervals. [Time series modeling](time/index) is a recommended practice for data science problems where conditions may change over time.

#### Time series analysis {: #time-series-analysis data-category=time-aware;modeling }
Methods for analyzing time series data in order to extract meaningful statistics and other characteristics of the data.

#### Time series forecasting {: #time-series-forecasting data-category=time-aware;modeling }
The use of a model to predict future values based on previously observed values. In practice, a forecasting model may combine time series features with other data.

#### Time step {: #time-step data-category=time-aware;modeling }
The detected median time delta between rows in the time series; DataRobot determines the time unit. The time step consists of a number and a time-delta unit, for example (15, “minutes”). If a step isn’t detected, the dataset is considered irregular and time series mode may be disabled. See also [regular data](#regular-data) and [semi-regular data](#semi-regular-data).
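
To make "median time delta" concrete, the sketch below computes it for a short list of timestamps with one gap. The function name and sample data are hypothetical, and the sketch does not cover how DataRobot chooses the time unit or decides that a dataset is irregular.

```python
from datetime import datetime, timedelta
from statistics import median


def detect_time_step(timestamps):
    """Return the median gap, in seconds, between consecutive timestamps."""
    ordered = sorted(timestamps)
    deltas = [
        (later - earlier).total_seconds()
        for earlier, later in zip(ordered, ordered[1:])
    ]
    return median(deltas)


# Readings every 15 minutes, with one reading missing.
start = datetime(2024, 1, 1, 9, 0)
timestamps = [start + timedelta(minutes=15 * i) for i in range(6)]
del timestamps[3]

step_seconds = detect_time_step(timestamps)
step = (int(step_seconds // 60), "minutes")
```

Because the median is used, the single 30-minute gap does not change the detected step of 15 minutes.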

{% include 'includes/genai/token-include.md' %}

#### Tracking agent {: #tracking-agent }
See [MLOps agent](#mlops-agent).

#### Training {: #training data-category=modeling;data-prep }
The process of building models on data in which the target is known.

#### Training Dashboard {: #training-dashboard data-category=modeling }

A model Leaderboard tab ([Evaluate > Training dashboard](training-dash)) that provides, for each executed iteration, information about a model's training and test loss, accuracy, learning rate, and momentum to help you better understand what may have happened during model training.

#### Training data {: #training-data data-category=modeling;data-prep }
The portion (partition) of data used to build models. See also [validation](#validation), [cross-validation](#cross-validation), and [holdout](#holdout).

#### Transfer learning {: #transfer-learning data-category=modeling }
A technique in which a project trains on one dataset, extracts information that may be useful, and applies that learning to another dataset.

#### Trend {: #trend data-category=time-aware;modeling }
An increase or decrease over time. Trends can be linear or non-linear and can show fluctuation. A series with a trend is not [stationary](#stationarity).

#### Tuning {: #tuning data-category=modeling }
A trial-and-error process by which you change some hyperparameters, run the algorithm on the data again, then compare performance to determine which set of hyperparameters results in the most accurate model. In DataRobot, this functionality is available from the Advanced Tuning tab.
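
The trial-and-error loop can be sketched generically. The example assumes a one-feature ridge-style model with a single hyperparameter (`penalty`) and a small hypothetical dataset; it illustrates the compare-and-select process only, not the Advanced Tuning tab.

```python
def fit_slope(xs, ys, penalty):
    """Closed-form slope of a no-intercept, ridge-style fit."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + penalty)


def squared_error(slope, xs, ys):
    """Sum of squared errors of the fitted line on the given data."""
    return sum((y - slope * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical training and validation data.
train_x, train_y = [1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5, 6], [10.1, 11.9]

# Try each candidate setting, refit, and record the validation error.
results = {}
for penalty in [0.0, 1.0, 10.0]:
    slope = fit_slope(train_x, train_y, penalty)
    results[penalty] = squared_error(slope, valid_x, valid_y)

best_penalty = min(results, key=results.get)
```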

## U
-----------

#### Unit of analysis {: #unit-of-analysis }
(Machine learning) The unit of observation at which you are making a prediction.

#### Unlimited multiclass {: #unlimited-multiclass }
See [classification](#classification).

#### Unmaterialized {: #unmaterialized data-category=data-prep }
Data that DataRobot samples for profile statistics, but does not keep. Instead, the catalog stores a pointer to the data and only pulls it upon user request at project start or when running batch predictions. See also [materialized](#materialized) data.

{% include 'includes/genai/unstructured-include.md' %}

#### Unsupervised learning {: #unsupervised-learning data-category=modeling }
The ability to infer patterns from a dataset without reference to known (labeled) outcomes and without a specified target. Types of unsupervised learning include anomaly detection, outlier detection, novelty detection, and clustering. With anomaly detection, DataRobot applies unsupervised learning to detect abnormalities in a dataset. With clustering, DataRobot uses unsupervised learning to discern natural groupings in the data. See also [supervised](#supervised-learning) learning.

#### User blueprint {: #user-blueprints data-category=modeling }
A blueprint (and extra metadata) that has been created by a user and saved to the AI Catalog, where it can be both shared and further modified. This is not the same as a blueprint available from the Repository or via models on the Leaderboard, though both can be used as the basis for creation of a user blueprint. See also [blueprint](#blueprint).

## V
-----------

#### Validation {: #validation data-category=modeling }
The validation (or testing) partition is a subsection of data that is withheld from training and used to evaluate a model’s performance. Since this data was not used to build the model, it can provide an unbiased estimate of a model’s accuracy. You often compare the results of validation when selecting a model. See also [cross-validation](#cross-validation).

#### Variable {: #variable }
See [feature](#feature).

{% include 'includes/genai/vdb-include.md' %}

#### Visual AI {: #visual-ai data-category=modeling }
DataRobot's ability to combine supported image types, either alone or in combination with other supported feature types, to create models that use images as input. The feature also includes specialized insights (e.g., image embeddings, activation maps, neural network visualizer) to help visually assess model performance.

## W
-----------

#### Word Cloud {: #word-cloud data-category=modeling }

A model Leaderboard tab ([Understand > Word Cloud](word-cloud)) that displays the most relevant words and short phrases in word cloud format.

#### Worker {: #worker data-category=modeling }
The processing power behind the DataRobot platform, used for creating projects, training models, and making predictions. Workers represent the portion of processing power allocated to a task. DataRobot uses different types of workers for different phases of the project workflow, including DSS workers (Dataset Service workers), EDA workers, secure modeling workers, and quick workers.

## X
-----------

#### XEMP (eXemplar-based Explanations of Model Predictions) {: #xemp-exemplar-based-explanations-of-model-predictions data-category=modeling }
A methodology for computing Prediction Explanations that works for all models. See also [Prediction Explanations](#prediction-explanations), [SHAP](#shap-shapley-values).

## Z
-----------

#### Z Score {: #z-score data-category=modeling }
A metric measuring whether a given class of the protected feature is “statistically significant” across the population. Used in bias and fairness modeling.
